Cell and Tumor Classifications Using Gene Expression Data: Forests versus Trees
Abstract
The advancement of gene chips has unveiled a promising technology for cell, tumor, and cancer classification. We exploit classification trees for tumor and cell classification based on gene expression. To improve classification and prediction accuracy, we introduce a deterministic procedure to form a forest and compare it with existing alternatives. We also explore the use of two impurities rather than one. Using three published and commonly used data sets, we found that the deterministic forest with two impurities outperforms other forests and single trees. In addition, we provide graphical presentations to help interpret our results and support our findings with a literature search.

Introduction

The advancement of gene chips has unveiled a promising technology for tumor and cancer classification. The classic approach does not discriminate among tumors with similar histopathologic features, which may vary in clinical course and in response to treatment. Classification and diagnosis based on gene expression profiles may provide more information than classic morphology and identify pathologically different tumor types. Such additional information can be critical to cancer treatment. Many investigators have exploited a variety of analytic methods to derive accurate classification criteria for tumors and cancers using microarray data.

We present a tree-based classification technique that is intuitively appealing, easy to interpret, and can produce highly accurate classification rules. While this technique holds great promise, microarray data present an unprecedented challenge: we have far fewer samples than genes studied. As a result, we face a situation in which many classification rules are statistically indistinguishable but may have different medical implications.
Our key idea is to take advantage of this pool of rich, competitive trees and to predict a class collectively based on all of the trees in the forest. Although some trees in the forest contain clinically irrelevant genes, it is reasonable to believe that the relevant genes are likely to be part of the forest and that perhaps most trees in the forest are biologically relevant. We demonstrate this process of reasoning through the analyses of two published data sets.

It is known in the field of machine learning that the use of forests improves classification accuracy over a single tree. However, forests are commonly formed through a random process, which carries a certain degree of uncertainty and may make the forests difficult to interpret. Our objective is to introduce a deterministic process for constructing a forest with improved quality and interpretation. Because the process is deterministic, the forest is reproducible, and the genes appearing in the forest can be easily identified for further experiments and analyses.

Method and Result

The Growing of a Single Tree

First, we briefly describe how a single tree can be grown. Recursive partitioning is the thrust of tree growing. Suppose we have data from n units of observation (e.g., samples). Each unit contains a vector of feature measurements (e.g., gene expression levels) and a class label (e.g., normal or cancer). Recursive partitioning is a technique that builds a classification rule to predict class membership from the feature information by extracting homogeneous strata from the sample. For example, in Figure 1, the entire sample of 72 cells (the circle at the top, also called the root node) is split into two subgroups, which are called daughter nodes. The selected predictor and its corresponding cut-off value are chosen to purify the distribution of the response, namely, to separate different tissues from each other.
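The node impurity can be measured by the Gini index, defined as $\sum_{i \neq j} p_i p_j$, where $p_i$ is the proportion of samples in the node that belong to class $i$.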
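As a rough illustration of how the Gini index can guide the choice of a cut-off value, the sketch below computes the index for a node and then searches one feature for the cut-off that gives the purest pair of daughter nodes. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the function names `gini_index` and `best_split` are hypothetical, the index is computed in the equivalent form one minus the sum of squared class proportions, and candidate splits are ranked by the size-weighted average impurity of the two daughter nodes, which is the usual convention for classification trees but is assumed here.

```python
from collections import Counter


def gini_index(labels):
    """Gini index of a node: 1 - sum of squared class proportions.

    This equals the sum over pairs of distinct classes i != j of p_i * p_j.
    An empty node is treated as pure (an assumption for this sketch).
    """
    n = len(labels)
    if n == 0:
        return 0.0
    proportions = [count / n for count in Counter(labels).values()]
    return 1.0 - sum(p * p for p in proportions)


def best_split(values, labels):
    """Search one feature for the cut-off with the lowest weighted Gini index.

    Returns (cutoff, weighted_impurity). If no cut-off improves on the
    parent node, cutoff is None and the parent impurity is returned.
    """
    n = len(values)
    best = (None, gini_index(labels))
    distinct = sorted(set(values))
    # Candidate cut-offs are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        cutoff = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= cutoff]
        right = [y for x, y in zip(values, labels) if x > cutoff]
        weighted = (len(left) * gini_index(left)
                    + len(right) * gini_index(right)) / n
        if weighted < best[1]:
            best = (cutoff, weighted)
    return best


if __name__ == "__main__":
    # Made-up expression levels for one gene and made-up class labels.
    expression = [1.0, 2.0, 3.0, 4.0]
    tissue = ["normal", "normal", "cancer", "cancer"]
    print(gini_index(tissue))
    print(best_split(expression, tissue))
```

Growing a full tree would repeat this search over every gene at each node and then recurse into the two daughter nodes; the sketch covers only a single split on a single feature.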
Related articles
Cell and tumor classification using gene expression data: construction of forests.
The advent of gene chips has led to a promising technology for cell, tumor, and cancer classification. We exploit and expand the methodology of recursive partitioning trees for tumor and cell classification from microarray gene expression data. To improve classification and prediction accuracy, we introduce a deterministic procedure to form forests of classification trees and compare their perf...
Down-regulation of HSP40 gene family following OCT4B1 suppression in human tumor cell lines
Objective(s): The OCT4B1, as one of OCT4 variants, is expressed in cancer cell lines and tissues more than other variants and plays an important role in apoptosis and stress (heat shock protein) pathways. The present study was designed to determine the effects of OCT4B1 silencing on expressional profile of HSP40 gene family expression in three different human tumor cell lines. Materials and Met...
CD44 expression changes and increased apoptosis in MCF-7 cell line of breast cancer in simulated microgravity condition
Introduction: Studies have shown that simulated microgravity (SMG) affects tumor cell proliferation and metastasis. However, the underlying mechanism and its molecular basis are still not well known. In recent years, due to the role of CD44 in breast cancer and its high expression in invasive basal tumors, it has been the subject of extensive research. There is a conflicting data on the role of...
The Effect of Wild Type P53 Gene Transfer on Growth Properties and Tumorigenicity of PANC-1 Tumor Cell Line
The p53 protein function is essential for the maintenance of the nontumorigenic cell phenotype. Pancreatic tumor cells show a very high frequency of p53 mutation. To determine if restoration of wild type p53 function can be used to eliminate the tumorigenic phenotype in these cells, pancreatic tumor cell lines, PANC-1 and HTB80, differing in p53 status were stably transfected with exogenous wil...
STAT3 as a Key Factor in Tumor Microenvironment and Cancer Stem Cell
Background Recent studies revealed that tumor-associated macrophages (TAMs) play a decisive role in the regulation of tumor progression by manipulating tumor oncogenesis, angiogenesis and immune functions within tumor microenvironments. Signal transducer and activator of transcription 3 (STAT3), which is a point of convergence for numerous oncogenic signalling pathways, is constitutively activ...
Publication date: 2002